Method for performing multi-caching on data sources of same type and different types by using cluster-based processing system and device using the same

ABSTRACT

A method for performing multi-caching on data sources of a same type and different types by using a cluster-based processing system is provided. The method includes steps of: a big data cluster management device (a) determining whether a result set, corresponding to a query result, is present as first cache data in master or worker nodes, (b) if a specific part of the result set is absent, (i) establishing an execution plan, (ii) acquiring a first subset present in the master or the worker nodes, (iii) acquiring a second subset, present in none of the master and the worker nodes, from an external data source, and (iv) applying a joint operation thereto, and (c) applying a data processing operation and an output operation thereto, thus acquiring the result set as the query result.

FIELD OF THE DISCLOSURE

The present disclosure relates to a method for performing multi-caching on data sources of a same type and different types using a cluster-based processing system and a device using the same.

BACKGROUND OF THE DISCLOSURE

As IT/OT technologies have developed, the amount of data being collected is increasing exponentially, and accordingly, research on creating additional value using big data is being conducted. Technologies and tools for handling big data are also being developed, and among them, open-source big data analysis systems such as Apache Solr and Elasticsearch are attracting attention.

Herein, Apache Solr and Elasticsearch are built on the search engine Lucene. To be specific, Apache Solr and Elasticsearch are specialized in certain functions, such as searching and storing index-based data or simple statistical analysis; however, they are limited in other functions, such as modification of the searched data and analysis and aggregation of two or more data sets.

In addition, even in the case of an RDB (relational database) that processes queries related to two or more data sets using a query language such as SQL (structured query language), its performance is limited to a single data source, so it does not support analysis among data sets in data sources of different types.

For this reason, at present, when analysis among data sources of different types is required, a separate application for the analysis should be implemented each time, or the data sets in the data sources of the different types should be extracted, transformed and loaded into a single data source. However, such procedures have the limitation that resources are wasted during data duplication and that processing the data takes a long time.

To solve this problem, the inventor of the present disclosure proposes a method for performing multi-caching on the data sources of the same type and the different types by using a cluster-based processing system.

SUMMARY OF THE DISCLOSURE

It is an object of the present disclosure to solve all the aforementioned problems.

It is another object of the present disclosure to provide a method for performing multi-caching on data sources of a same type and different types by using a cluster-based processing system.

It is still another object of the present disclosure to provide the method for analysis among the data sources of the different types without moving data sets of the data sources of the different types into a single database.

In order to accomplish the objects above and the characteristic effects of the present disclosure to be described later, distinctive structures of the present disclosure are described as follows.

In accordance with one aspect of the present disclosure, there is provided a method for performing multi-caching on data sources of a same type and different types by using a cluster-based processing system, including steps of: (a) if a user query is acquired, a big data cluster management device performing or supporting another device to perform a process of determining whether a result set, corresponding to a query result of the user query, is present as first cache data in at least one master node or at least one worker node, wherein the worker node is included in a same system with the master node and communicates with the master node; (b) the big data cluster management device performing or supporting another device to perform: if at least specific part of the result set is determined as present in none of the master node and the worker node, (i) a process of establishing an execution plan to sequentially execute a search operation, a data processing operation, and an output operation based on a result of parsing the user query, (ii) a process of acquiring a first subset, determined as present as second cache data in the master node or the worker node, by allowing the master node or the worker node to execute the search operation according to the execution plan, wherein the second cache data represents elemental sets included in the result set, (iii) a process of acquiring a second subset, determined as present in none of the master node and the worker node, from the second cache data by allowing at least one external data source to execute the search operation according to the execution plan, and (iv) a process of applying at least part of a joint operation included in the search operation to the first subset and the second subset according to the execution plan, to thereby acquire a result of the search operation, wherein the joint operation includes at least part of a JOIN operation and a UNION operation; and (c) the big data cluster management device performing or supporting another device to perform a process of applying the data processing operation and the output operation to the result of the search operation according to the execution plan, to thereby acquire and output the result set as the query result.

As one example, at the process of (iii) in the step of (b), if a first data set to an n-th data set, to which at least part of the joint operation is to be applied, are determined as present in the master node or the worker node, the big data cluster management device performs or supports another device to perform a process of allowing the master node or the worker node to apply the joint operation to the first data set to the n-th data set, to thereby acquire the first subset.

As one example, the big data cluster management device performs or supports another device to perform a process of updating the second cache data such that the second cache data include the first subset.

As one example, at the processes of (iii) and (iv) in the step of (b), if at least part of a first data set to an n-th data set, to which at least part of the joint operation is to be applied, is determined as present in none of the master node and the worker node, the big data cluster management device performs or supports another device to perform: (i) a process of allowing the master node or the worker node to execute the search operation according to the execution plan, to thereby acquire at least one specific data set among the first data set to the n-th data set, wherein the specific data set is determined as present in the master node or the worker node, (ii) a process of allowing the external data source to execute the search operation according to the execution plan, to thereby acquire a remaining data set among the first data set to the n-th data set, wherein the remaining data set is determined as present in none of the master node and the worker node, and (iii) a process of allowing the master node to apply the joint operation to the specific data set and the remaining data set, to thereby acquire the second subset.

As one example, the big data cluster management device performs or supports another device to perform a process of updating the second cache data such that the second cache data include the remaining data set and the second subset.

As one example, at the process of (iv) in the step of (b), the big data cluster management device performs or supports another device to perform a process of updating the second cache data such that the second cache data include the second subset.

As one example, the big data cluster management device performs or supports another device to perform a process of executing the search operation and the data processing operation in a file-based manner.

As one example, at the step of (c), the data processing operation includes at least one of an aggregating operation, a data transforming operation, a filtering operation, a sorting operation, and a data truncating operation.

As one example, at the step of (c), the big data cluster management device performs or supports another device to perform a process of updating the first cache data such that the first cache data include the query result.

As one example, at the step of (c), the output operation includes at least one of a screen displaying operation, a remote RDB (relational database) storing operation, and a file storing operation.

In accordance with another aspect of the present disclosure, there is provided a big data cluster management device for performing multi-caching on data sources of a same type and different types by using a cluster-based processing system, including: at least one memory that stores instructions; and at least one processor configured to execute the instructions to perform or support another device to perform: (I) if a user query is acquired, a process of determining whether a result set, corresponding to a query result of the user query, is present as first cache data in at least one master node or at least one worker node, wherein the worker node is included in a same system with the master node and communicates with the master node, (II) if at least specific part of the result set is determined as present in none of the master node and the worker node, (i) a process of establishing an execution plan to sequentially execute a search operation, a data processing operation, and an output operation based on a result of parsing the user query, (ii) a process of acquiring a first subset, determined as present as second cache data in the master node or the worker node, by allowing the master node or the worker node to execute the search operation according to the execution plan, wherein the second cache data represents elemental sets included in the result set, (iii) a process of acquiring a second subset, determined as present in none of the master node and the worker node, from the second cache data by allowing at least one external data source to execute the search operation according to the execution plan, and (iv) a process of applying at least part of a joint operation included in the search operation to the first subset and the second subset according to the execution plan, to thereby acquire a result of the search operation, wherein the joint operation includes at least part of a JOIN operation and a UNION operation, and (III) a process of applying the data processing operation and the output operation to the result of the search operation according to the execution plan, to thereby acquire and output the result set as the query result.

As one example, at the process of (iii) in the process of (II), if a first data set to an n-th data set, to which at least part of the joint operation is to be applied, are determined as present in the master node or the worker node, the processor performs or supports another device to perform a process of allowing the master node or the worker node to apply the joint operation to the first data set to the n-th data set, to thereby acquire the first subset.

As one example, the processor performs or supports another device to perform a process of updating the second cache data such that the second cache data include the first subset.

As one example, at the processes of (iii) and (iv) in the process of (II), if at least part of a first data set to an n-th data set, to which at least part of the joint operation is to be applied, is determined as present in none of the master node and the worker node, the processor performs or supports another device to perform: (i) a process of allowing the master node or the worker node to execute the search operation according to the execution plan, to thereby acquire at least one specific data set among the first data set to the n-th data set, wherein the specific data set is determined as present in the master node or the worker node, (ii) a process of allowing the external data source to execute the search operation according to the execution plan, to thereby acquire a remaining data set among the first data set to the n-th data set, wherein the remaining data set is determined as present in none of the master node and the worker node, and (iii) a process of allowing the master node to apply the joint operation to the specific data set and the remaining data set, to thereby acquire the second subset.

As one example, the processor performs or supports another device to perform a process of updating the second cache data such that the second cache data include the remaining data set and the second subset.

As one example, at the process of (iv) in the process of (II), the processor performs or supports another device to perform a process of updating the second cache data such that the second cache data include the second subset.

As one example, the processor performs or supports another device to perform a process of executing the search operation and the data processing operation in a file-based manner.

As one example, at the process of (III), the data processing operation includes at least one of an aggregating operation, a data transforming operation, a filtering operation, a sorting operation, and a data truncating operation.

As one example, at the process of (III), the processor performs or supports another device to perform a process of updating the first cache data such that the first cache data include the query result.

As one example, at the process of (III), the output operation includes at least one of a screen displaying operation, a remote RDB (relational database) storing operation, and a file storing operation.

BRIEF DESCRIPTION OF THE DRAWINGS

The following drawings to be used to explain example embodiments of the present disclosure are only part of example embodiments of the present disclosure and other drawings can be obtained based on the drawings by those skilled in the art of the present disclosure without inventive work.

FIG. 1 is a drawing schematically illustrating a big data cluster management device for performing multi-caching on data sources of a same type and different types by using a cluster-based processing system in accordance with one example embodiment of the present disclosure.

FIG. 2 is a drawing schematically illustrating a configuration of the cluster-based processing system in accordance with one example embodiment of the present disclosure.

FIG. 3 is a drawing schematically illustrating processes of performing the multi-caching on the data sources of the same type and the different types by using the cluster-based processing system in accordance with one example embodiment of the present disclosure.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

The detailed explanation of the present disclosure below refers to the attached drawings and diagrams, which illustrate specific example embodiments under which the present disclosure may be implemented, in order to clarify the purposes, technical solutions, and advantages of the present disclosure. These embodiments are described in sufficient detail to enable those skilled in the art to practice the disclosure.

Besides, in the detailed description and claims of the present disclosure, the term “include” and its variations are not intended to exclude other technical features, additions, components or steps. Other objects, benefits and features of the present disclosure will be revealed to those skilled in the art, partially from the specification and partially from the implementation of the present disclosure. The following examples and drawings are provided as examples and are not intended to limit the present disclosure.

Moreover, the present disclosure covers all possible combinations of example embodiments indicated in this specification. It is to be understood that the various embodiments of the present disclosure, although different, are not necessarily mutually exclusive. For example, a particular feature, structure, or characteristic described herein in connection with one embodiment may be implemented within other embodiments without departing from the spirit and scope of the present disclosure. In addition, it is to be understood that the position or arrangement of individual elements within each disclosed embodiment may be modified without departing from the spirit and scope of the present disclosure. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope of the present disclosure is defined only by the appended claims, appropriately interpreted, along with the full range of equivalents to which the claims are entitled. In the drawings, like numerals refer to the same or similar functionality throughout the several views.

The headings and abstract of the present disclosure provided herein are for convenience only and do not limit or interpret the scope or meaning of the embodiments.

To allow those skilled in the art to carry out the present disclosure easily, the example embodiments of the present disclosure will be explained in detail by referring to the attached diagrams as shown below.

FIG. 1 is a drawing schematically illustrating a big data cluster management device for performing multi-caching on data sources of a same type and different types by using a cluster-based processing system in accordance with one example embodiment of the present disclosure. By referring to FIG. 1, the big data cluster management device 100 may include a memory 110 for storing instructions to acquire a user query of a user and output a query result of the user query, and a processor 120 for performing processes of acquiring the user query and outputting the query result, according to the instructions in the memory 110.

Specifically, the big data cluster management device 100 may typically achieve a desired system performance by using combinations of at least one computing device and at least one computer software, e.g., a computer processor, a memory, a storage, an input device, an output device, or any other conventional computing components, an electronic communication device such as a router or a switch, an electronic information storage system such as a network-attached storage (NAS) device and a storage area network (SAN) as the computing device and any instructions that allow the computing device to function in a specific way as the computer software.

Also, the processors of such devices may include hardware configuration of MPU (Micro Processing Unit) or CPU (Central Processing Unit), cache memory, data bus, etc. Additionally, the computing device may further include OS (operating system) and software configuration of applications that achieve specific purposes.

Such description of the computing device does not exclude an integrated device including any combination of a processor, a memory, a medium, or any other computing components for implementing the present disclosure.

A configuration and functions of the cluster-based processing system including the big data cluster management device 100 in accordance with one example embodiment of the present disclosure are described by referring to FIG. 2.

FIG. 2 is a drawing schematically illustrating a configuration of the cluster-based processing system 1000 including the big data cluster management device 100 in accordance with one example embodiment of the present disclosure.

Herein, the cluster-based processing system 1000 may be connected with at least one external data source 200, and may include the big data cluster management device 100, one or more master nodes 150, and one or more worker nodes 160.

In detail, a user device for acquiring the user query may allow access to the big data cluster management device 100 included in the cluster-based processing system 1000. Also, the external data source 200 may be connected with the master nodes 150 included in the cluster-based processing system 1000. Further, the big data cluster management device 100 may be connected with each of the master nodes 150 and the worker nodes 160.

Herein, the big data cluster management device 100 may perform or support another device to perform (i) a process of acquiring the user query from the user device and (ii) a process of allowing execution of operations included in the user query on data and/or cache data stored in the master nodes 150, the worker nodes 160, and the external data source 200. Further, the cache data may include (1) first cache data among which a result set, corresponding to a query result of the user query per se, may or may not be included and (2) second cache data merely representing elemental sets included in the result set. Also, the big data cluster management device 100 may be configured as independent of the master nodes 150, but the scope of the present disclosure is not limited thereto, and the big data cluster management device 100 may include the master nodes 150.

Meanwhile, the master nodes 150 may be connected with the external data source 200 and the worker nodes 160 directly or indirectly. Also, the master nodes 150 may store the first cache data and the second cache data, and may execute the operations included in the user query on the data and/or the cache data stored in the master nodes 150, the worker nodes 160, and the external data source 200. Herein, the worker nodes 160 may include the cache data, i.e., the first cache data and/or the second cache data. Also, the cluster-based processing system 1000 may be configured to include two or more of the master nodes 150 so that structural redundancy of the master nodes 150 is established.

Meanwhile, the worker nodes 160 may store the first cache data and the second cache data, and may execute the operations included in the user query on the cache data stored in the worker nodes 160. Herein, the cache data may include the first cache data and/or the second cache data. Also, the cluster-based processing system 1000 may include two or more of the worker nodes 160 so that a cluster comprised of the worker nodes 160 is implemented.

The configuration and the functions of the cluster-based processing system 1000 are described as above. Processes of performing the multi-caching on the data sources of the same type and the different types by using the cluster-based processing system 1000 are described by referring to FIG. 3 as follows.

First, if the user query is acquired at a step of S301, the big data cluster management device 100 may perform or support another device to perform a process of determining whether the result set, corresponding to the query result, is present as the first cache data in at least one of the master nodes 150 or at least one of the worker nodes 160 at a step of S302. Herein, the user query may be parsed for the purpose of determining whether the result set is present as the first cache data in at least one of the master nodes 150 and/or at least one of the worker nodes 160.

If the result set is determined as present as the first cache data in at least one of the master nodes 150 or at least one of the worker nodes 160, then the big data cluster management device 100 may skip the steps from S303 to S307, and may acquire the result set from the first cache data as the query result.
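As a minimal sketch only, the first-cache check of S302 might look like the following. The disclosure does not specify how a user query is matched to cached result sets; the normalized-hash key, the function names, and the node names here are hypothetical assumptions for illustration.

```python
import hashlib


def query_key(user_query: str) -> str:
    """Assumed keying scheme: collapse whitespace and case, then hash the text."""
    normalized = " ".join(user_query.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def lookup_result_set(user_query, node_caches):
    """Return (node_name, result_set) for the first node holding the result set, else None."""
    key = query_key(user_query)
    for node_name, first_cache in node_caches.items():
        if key in first_cache:
            return node_name, first_cache[key]
    return None


# Hypothetical first caches held by one master node and one worker node.
node_caches = {"master-1": {}, "worker-1": {}}
node_caches["worker-1"][query_key("SELECT * FROM logs")] = [{"id": 1}]

hit = lookup_result_set("select *   from LOGS", node_caches)
miss = lookup_result_set("SELECT * FROM users", node_caches)
```

On a hit the remaining steps can be skipped and the cached result set returned directly; on a miss the device would proceed to parsing and planning. Lowercasing the whole query is a simplification that would also alter string literals, so a real implementation would likely key on the parsed query instead.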

If at least specific part of the result set is determined as present in none of the master nodes 150 and the worker nodes 160, then the big data cluster management device 100 may perform or support another device to perform (i) a process of parsing the user query at a step of S303 and (ii) a process of establishing an execution plan to execute the user query based on a result of parsing the user query at a step of S304.

Herein, the execution plan may be established to sequentially execute a search operation, a data processing operation, and an output operation, included in the user query.
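The sequential structure of such an execution plan can be illustrated with a short sketch. The `ExecutionPlan` class and its field names are hypothetical and are not taken from the disclosure; the sketch only shows a search operation, a list of data processing operations, and an output operation being run in that order.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Rows = List[dict]


@dataclass
class ExecutionPlan:
    """Hypothetical plan holding the three operation kinds in execution order."""
    search: Callable[[], Rows]
    processing: List[Callable[[Rows], Rows]] = field(default_factory=list)
    output: Callable[[Rows], None] = print

    def run(self) -> Rows:
        rows = self.search()              # search operation
        for step in self.processing:      # data processing operations, in order
            rows = step(rows)
        self.output(rows)                 # output operation
        return rows


collected = []
plan = ExecutionPlan(
    search=lambda: [{"v": 3}, {"v": 1}, {"v": 2}],
    processing=[
        lambda rows: sorted(rows, key=lambda r: r["v"]),  # sorting
        lambda rows: rows[:2],                            # truncating
    ],
    output=collected.extend,  # stands in for screen, RDB, or file output
)
result = plan.run()
```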

Then, the big data cluster management device 100 may perform or support another device to perform a process of acquiring a first subset determined as present as second cache data in the master nodes 150 or the worker nodes 160 by allowing the master nodes 150 or the worker nodes 160 to execute the search operation according to the execution plan, at a step of S305. Herein, the second cache data may represent the elemental sets included in the result set.

If a first data set to an n-th data set, to which at least part of a joint operation included in the execution plan is to be applied, is determined as present as the first subset in the master nodes 150 or the worker nodes 160, then the big data cluster management device 100 may perform or support another device to perform a process of allowing the master nodes 150 or the worker nodes 160 to apply the joint operation to the first data set to the n-th data set, to thereby acquire the first subset, instead of acquiring the first data set to the n-th data set as a whole as the first subset. Herein, the joint operation may include at least part of a JOIN operation and a UNION operation. Also, the big data cluster management device 100 may perform or support another device to perform a process of updating the second cache data such that the second cache data include the first subset.
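For readers unfamiliar with the two joint operations named above, the following is a minimal sketch of a JOIN and a UNION over rows represented as dictionaries. The hash-join strategy and the function names are illustrative assumptions; the disclosure does not state which join algorithm the nodes use.

```python
def inner_join(left, right, key):
    """Hash join: index the right rows by key, then merge each matching pair."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def union(left, right):
    """UNION with duplicate removal, keeping first-seen order (values must be hashable)."""
    seen, out = set(), []
    for row in left + right:
        marker = tuple(sorted(row.items()))
        if marker not in seen:
            seen.add(marker)
            out.append(row)
    return out


orders = [{"uid": 1, "item": "a"}, {"uid": 2, "item": "b"}]
users = [{"uid": 1, "name": "kim"}]
joined = inner_join(orders, users, "uid")
merged = union([{"x": 1}], [{"x": 1}, {"x": 2}])
```

Applying the joint operation on the node that already holds the data sets, as the paragraph above describes, means only the joined rows need to be returned rather than every input data set.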

Then, the big data cluster management device 100 may perform or support another device to perform a process of acquiring a second subset by allowing the external data source 200 to execute the search operation according to the execution plan, at a step of S306. Herein, the second cache data may be comprised of the first subset and the second subset. The first subset can be acquired from the master nodes 150 and the worker nodes 160. In contrast, the second subset cannot be acquired from the master nodes 150 and the worker nodes 160, that is, the second subset cannot be acquired by applying the joint operation to the first subset and may be acquired from the external data source 200.

Herein, if at least part of the first data set to the n-th data set, required for acquiring the second subset, is determined as present in the master nodes 150 and/or the worker nodes 160, then the big data cluster management device 100 may perform or support another device to perform (i) a process of allowing the master nodes 150 and/or the worker nodes 160 to execute the search operation according to the execution plan, to thereby acquire at least one specific data set among the first data set to the n-th data set and (ii) a process of allowing the external data source 200 to execute the search operation according to the execution plan, to thereby acquire a remaining data set among the first data set to the n-th data set, where the remaining data set is determined as present in none of the master nodes 150 and the worker nodes 160. Then the big data cluster management device 100 may perform or support another device to perform a process of applying the joint operation to the specific data set and the remaining data set, to thereby acquire the second subset. Herein, the big data cluster management device 100 may perform or support another device to perform a process of updating the second cache data such that the second cache data include the remaining data set and the second subset.
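The mixed case described above, where one data set is already cached and the remaining data set must come from the external data source, can be sketched as follows. The dictionaries standing in for the second cache and the external data source, and the `fetch_data_set` helper, are hypothetical simplifications.

```python
def fetch_data_set(name, second_cache, external_source):
    """Return (rows, origin); on a cache miss, read the external source and cache the rows."""
    if name in second_cache:
        return second_cache[name], "cache"
    rows = external_source[name]   # stands in for a search run by the external data source
    second_cache[name] = rows      # update the second cache with the remaining data set
    return rows, "external"


# Hypothetical state: "users" is cached on a node, "orders" exists only externally.
second_cache = {"users": [{"uid": 1, "name": "kim"}]}
external_source = {"orders": [{"uid": 1, "item": "a"}]}

users, users_origin = fetch_data_set("users", second_cache, external_source)
orders, orders_origin = fetch_data_set("orders", second_cache, external_source)

# Joint operation over the specific (cached) and remaining (external) data sets.
second_subset = [{**u, **o} for u in users for o in orders if u["uid"] == o["uid"]]
second_cache["users_join_orders"] = second_subset  # cache the second subset as well
```

After this runs, a later query needing either `orders` or the joined rows could be served from the second cache without contacting the external data source again.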

Then, the big data cluster management device 100 may perform or support another device to perform (i) a process of applying at least part of the joint operation, included in the search operation, to the first subset and the second subset according to the execution plan, to thereby acquire a result of the search operation, and (ii) a process of applying the data processing operation and the output operation to the result of the search operation according to the execution plan, to thereby acquire and output the result set as the query result, at steps of S307 and S308.

Herein, the big data cluster management device 100 may perform or support another device to perform a process of applying the data processing operation and the output operation in a file-based manner, that is, in an on-disk manner instead of an in-memory manner, to thereby prevent problems due to lack of memory during big data analysis.
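One way to picture file-based processing is to spill intermediate rows to a file and stream them back one row at a time, so that only a small running state is held in memory. The disclosure does not specify a file format or mechanism; the CSV format and the helper names below are assumptions made purely for illustration.

```python
import csv
import os
import tempfile
from collections import defaultdict


def spill_to_disk(rows, fieldnames):
    """Write rows to a temporary CSV file and return its path."""
    handle = tempfile.NamedTemporaryFile("w", newline="", suffix=".csv", delete=False)
    with handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return handle.name


def sum_by_key(path, key, value):
    """Stream the file row by row so only the running totals stay in memory."""
    totals = defaultdict(float)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            totals[row[key]] += float(row[value])
    return dict(totals)


path = spill_to_disk(
    [{"k": "a", "v": 1}, {"k": "b", "v": 2}, {"k": "a", "v": 3}], ["k", "v"]
)
totals = sum_by_key(path, "k", "v")
os.remove(path)  # clean up the temporary file
```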

Also, the data processing operation may include at least one of an aggregating operation, a data transforming operation, a filtering operation, a sorting operation, and a data truncating operation, but the scope of the present disclosure is not limited thereto. Further, the output operation may include at least one of a screen displaying operation, a remote RDB (relational database) storing operation, and a file storing operation, but the scope of the present disclosure is not limited thereto.
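The five data processing operations listed above can be illustrated on a small, made-up data set. The field names and the particular ordering of operations are hypothetical; the disclosure does not prescribe either.

```python
from itertools import groupby
from operator import itemgetter

rows = [
    {"city": "A", "temp_c": 20},
    {"city": "B", "temp_c": -5},
    {"city": "A", "temp_c": 30},
    {"city": "C", "temp_c": 10},
]

# Filtering: keep rows at or above freezing.
filtered = [r for r in rows if r["temp_c"] >= 0]

# Data transforming: derive a Fahrenheit column.
transformed = [{**r, "temp_f": r["temp_c"] * 9 / 5 + 32} for r in filtered]

# Aggregating: maximum temperature per city (groupby needs input sorted by the key).
by_city = sorted(transformed, key=itemgetter("city"))
aggregated = [
    {"city": city, "max_f": max(r["temp_f"] for r in group)}
    for city, group in groupby(by_city, key=itemgetter("city"))
]

# Sorting: hottest city first.
ordered = sorted(aggregated, key=itemgetter("max_f"), reverse=True)

# Data truncating: keep only the top row.
top = ordered[:1]
```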

Also, the big data cluster management device 100 may perform or support another device to perform a process of updating the first cache data such that the first cache data include the query result.

The present disclosure has an effect of providing a method for multi-caching on the data sources of the same type and the different types using the cluster-based processing system.

The present disclosure has another effect of providing a method for analysis among the data sources of the different types without moving data sets of the data sources of the different types into a single database.

The embodiments of the present disclosure as explained above can be implemented in the form of executable program commands through a variety of computer means recordable to computer readable media. The computer readable media may include, solely or in combination, program commands, data files, and data structures. The program commands recorded to the media may be components specially designed for the present disclosure or may be usable to a person skilled in the field of computer software. Computer readable media include magnetic media such as hard disks, floppy disks, and magnetic tape, optical media such as CD-ROM and DVD, magneto-optical media such as floptical disks, and hardware devices such as ROM, RAM, and flash memory specially designed to store and carry out program commands. Program commands include not only machine language code made by a compiler but also high-level code that can be executed by a computer using an interpreter, etc. The aforementioned hardware device can work as one or more software modules to perform the action of the present disclosure, and vice versa.

As seen above, the present disclosure has been explained by specific matters such as detailed components, limited embodiments, and drawings. They have been provided only to help more general understanding of the present disclosure. It, however, will be understood by those skilled in the art that various changes and modifications may be made from the description without departing from the spirit and scope of the disclosure as defined in the following claims.

Accordingly, the spirit of the present disclosure must not be confined to the explained embodiments, and the following patent claims as well as everything including variations equal or equivalent to the patent claims pertain to the category of the spirit of the present disclosure.

What is claimed is:
1. A method for performing multi-caching on data sources of different types by using a cluster-based processing system connected to an external data source, the cluster-based processing system including at least one master node and at least one worker node, the method performed by a data cluster management device of the cluster-based processing system and comprising: acquiring a user query that comprises a plurality of query elements; determining whether or not a complete response to the plurality of query elements pre-exists as joined data within a first cache of either the master node or the worker node; upon determining that a complete response to the user query does not pre-exist as joined data within the first cache of either the master node or the worker node, parsing the user query into the plurality of query elements; determining whether or not individual responses to each of the plurality of query elements exists as individual data elements within a second cache of either the master node or the worker node, thereby producing a determination result; the determination result indicating that: first joinable data corresponding to a first subset of the plurality of query elements exists as individual data elements within the second cache of either the master node or the worker node, first unjoined data corresponding to a second subset of the plurality of query elements exists as an individual data element within the second cache of either the master node or the worker node, and second unjoined data corresponding to the second subset of the plurality of query elements does not exist within the second cache of either the master node or the worker node; based on determining that the second unjoined data corresponding to the second subset of the plurality of query elements does not exist within the second cache of either the master node or the worker node, determining that the second unjoined data corresponding to the second subset of the plurality of query elements does exist as an individual data element within the external data source; performing a series of JOIN operations to create the complete response to the plurality of query elements, the JOIN operations comprising: joining the first joinable data to form first joined data, joining the first unjoined data within the second cache of either the master node or the worker node and second unjoined data within the external data source to form second joined data, and joining the first joined data with the second joined data to produce the complete response to the plurality of query elements; and reporting the complete response to the plurality of query elements as a reply to the user query.
 2. The method of claim 1, further comprising: copying the second unjoined data within the external data source to the second cache of either the master node or the worker node.
 3. The method of claim 1, wherein the first subset of the plurality of query elements corresponds to a first type of data and the second subset of the plurality of query elements corresponds to a second type of data different from the first type of data.
 4. The method of claim 1, further comprising: updating the first cache data such that the first cache data includes the complete response to the plurality of query elements acquired from joining the first joined data with the second joined data.
 5. A cluster management device for performing multi-caching on data sources of different types by using a cluster-based processing system connected to an external data source, the cluster-based processing system including a master node and a worker node, the cluster management device comprising:
   at least one memory that stores instructions; and
   at least one processor configured to execute the instructions to perform or support another device to perform a process comprising:
     acquiring a user query that comprises a plurality of query elements;
     determining whether or not a complete response to the plurality of query elements pre-exists as joined data within a first cache of either the master node or the worker node;
     upon determining that a complete response to the user query does not pre-exist as joined data within the first cache of either the master node or the worker node, parsing the user query into the plurality of query elements;
     determining whether or not individual responses to each of the plurality of query elements exist as individual data elements within a second cache of either the master node or the worker node, thereby producing a determination result;
     the determination result indicating that:
       first joinable data corresponding to a first subset of the plurality of query elements exists as individual data elements within the second cache of either the master node or the worker node,
       first unjoined data corresponding to a second subset of the plurality of query elements exists as an individual data element within the second cache of either the master node or the worker node, and
       second unjoined data corresponding to the second subset of the plurality of query elements does not exist within the second cache of either the master node or the worker node;
     based on determining that the second unjoined data corresponding to the second subset of the plurality of query elements does not exist within the second cache of either the master node or the worker node, determining that the second unjoined data corresponding to the second subset of the plurality of query elements does exist as an individual data element within the external data source;
     performing a series of JOIN operations to create the complete response to the plurality of query elements, the JOIN operations comprising:
       joining the first joinable data to form first joined data,
       joining the first unjoined data within the second cache of either the master node or the worker node and second unjoined data within the external data source to form second joined data, and
       joining the first joined data with the second joined data to produce the complete response to the plurality of query elements; and
     reporting the complete response to the plurality of query elements as a reply to the user query.
 6. The cluster management device of claim 5, wherein the process further comprises: copying the second unjoined data within the external data source to the second cache of either the master node or the worker node.
 7. The cluster management device of claim 5, wherein the first subset of the plurality of query elements corresponds to a first type of data and the second subset of the plurality of query elements corresponds to a second type of data different from the first type of data.
 8. The cluster management device of claim 5, wherein the process further comprises: updating the first cache data such that the first cache data includes the complete response to the plurality of query elements acquired from joining the first joined data with the second joined data.
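
For illustration only, the flow recited in claims 1, 2 and 4 may be sketched in Python as follows. This is a minimal, single-process sketch and not the claimed implementation: the class `ClusterManager`, the helper `join`, the in-memory dictionaries standing in for the first cache, the second cache and the external data source, the shared key column `id`, and the assumption that the user query has already been parsed into two subsets of element names are all hypothetical choices made for this example.

```python
from typing import Dict, FrozenSet, List

Row = Dict[str, object]
Table = List[Row]


def join(left: Table, right: Table, key: str) -> Table:
    """Inner equi-join of two row lists on a shared key column."""
    index: Dict[object, List[Row]] = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


class ClusterManager:
    """Hypothetical stand-in for the cluster management device."""

    def __init__(self, external_source: Dict[str, Table], key: str = "id"):
        # First cache: complete (joined) responses, keyed by the set of query elements.
        self.first_cache: Dict[FrozenSet[str], Table] = {}
        # Second cache: individual data elements, keyed by query element name.
        self.second_cache: Dict[str, Table] = {}
        # Assumption: the external data source is modeled as a plain dictionary.
        self.external_source = external_source
        self.key = key
        self.external_reads: List[str] = []  # bookkeeping for this example only

    def fetch(self, element: str) -> Table:
        """Return one data element, preferring the second cache."""
        if element in self.second_cache:
            return self.second_cache[element]
        data = self.external_source[element]   # not cached: read the external source
        self.external_reads.append(element)
        self.second_cache[element] = data      # copy into the second cache (claim 2)
        return data

    def join_all(self, elements: List[str]) -> Table:
        """Fetch every element of a subset and join them left to right."""
        tables = [self.fetch(e) for e in elements]
        result = tables[0]
        for table in tables[1:]:
            result = join(result, table, self.key)
        return result

    def answer(self, first_subset: List[str], second_subset: List[str]) -> Table:
        """Answer a query whose elements are already split into two subsets."""
        query_key = frozenset(first_subset + second_subset)
        cached = self.first_cache.get(query_key)
        if cached is not None:                 # complete response pre-exists
            return cached
        first_joined = self.join_all(first_subset)
        second_joined = self.join_all(second_subset)
        complete = join(first_joined, second_joined, self.key)
        self.first_cache[query_key] = complete  # update the first cache (claim 4)
        return complete
```

In this sketch a repeated call to `answer` with the same query elements is served from the first cache, and a data element read once from the external source is afterwards served from the second cache.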